67 research outputs found

A WCET-aware cache coloring technique for reducing interference in real-time systems

Author: Ballabriga Clément
Bouquillon Fabien
Lipari Giuseppe
Niar Smail
Publication venue: HAL CCSD
Publication date: 25/06/2019

The time predictability of a system is the condition for giving safe and precise bounds on the worst-case execution time (WCET) of the real-time functionalities running on it. Commercial off-the-shelf (COTS) processors are increasingly used in embedded systems and contain shared cache memory. This component's behavior is hard to predict because its state depends on the execution history of the system. To increase the predictability of COTS components, we use cache coloring, a technique widely used to partition cache memory. Our main contribution is a WCET-aware heuristic that partitions the cache according to the needs of each task. Our experiments use CPLEX, an ILP solver, on randomly generated task sets running on a preemptive system scheduled with earliest deadline first (EDF).

HAL Descartes

Grassroots Operator Search for Model Edge Adaptation

Author: Benmeziane Hadjer
Maghraoui Kaoutar El
Niar Smail
Ouarnoughi Hamza
Publication date: 20/09/2023

Hardware-aware Neural Architecture Search (HW-NAS) is increasingly being used to design efficient deep learning architectures. An efficient and flexible search space is crucial to the success of HW-NAS. Current approaches focus on designing a macro-architecture and searching for the architecture's hyperparameters based on a set of possible values. This approach is biased by the expertise of deep learning (DL) engineers and standard modeling approaches. In this paper, we present a Grassroots Operator Search (GOS) methodology. Our HW-NAS adapts a given model for edge devices by searching for efficient operator replacements. We express each operator as a set of mathematical instructions that capture its behavior. The mathematical instructions are then used as the basis for searching and selecting efficient replacement operators that maintain the accuracy of the original model while reducing computational complexity. Our approach is grassroots since it relies on the mathematical foundations to construct new and efficient operators for DL architectures. We demonstrate on various DL models that our method consistently outperforms the original models on two edge devices, namely Redmi Note 7S and Raspberry Pi3, with a minimum of 2.2x speedup while maintaining high accuracy. Additionally, we showcase a use case of our GOS approach in pulse rate estimation on wristband devices, where we achieve state-of-the-art performance while maintaining reduced computational complexity, demonstrating the effectiveness of our approach in practical applications.

arXiv.org e-Print Archive

HyT-NAS: Hybrid Transformers Neural Architecture Search for Edge Devices

Author: Benmeziane Hadjer
Mecharbat Lotfi Abdelkrim
Niar Smail
Ouarnoughi Hamza
Publication date: 28/03/2023

Vision Transformers have enabled recent attention-based Deep Learning (DL) architectures to achieve remarkable results in Computer Vision (CV) tasks. However, due to the extensive computational resources required, these architectures are rarely implemented on resource-constrained platforms. Current research investigates hybrid handcrafted convolution-based and attention-based models for CV tasks such as image classification and object detection. In this paper, we propose HyT-NAS, an efficient Hardware-aware Neural Architecture Search (HW-NAS) including hybrid architectures targeting vision tasks on tiny devices. HyT-NAS improves state-of-the-art HW-NAS by enriching the search space and enhancing the search strategy as well as the performance predictors. Our experiments show that HyT-NAS achieves a similar hypervolume with ~5x fewer training evaluations. Our resulting architecture outperforms MLPerf MobileNetV1 by 6.3% in accuracy with 3.5x fewer parameters on Visual Wake Words. Comment: CODAI 2022 Workshop - Embedded System Week (ESWeek)

arXiv.org e-Print Archive

Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices

Author: Bouzidi Halima
Ghebriout Mohamed Imed Eddine
Niar Smail
Ouarnoughi Hamza
Publication date: 28/09/2023

The recent surge of interest surrounding Multimodal Neural Networks (MM-NN) is attributed to their ability to effectively process and integrate multiscale information from diverse data sources. MM-NNs extract and fuse features from multiple modalities using adequate unimodal backbones and specific fusion networks. Although this helps strengthen the multimodal information representation, designing such networks is labor-intensive. It requires tuning the architectural parameters of the unimodal backbones, choosing the fusing point, and selecting the operations for fusion. Furthermore, multimodal AI is emerging as a cutting-edge option in Internet of Things (IoT) systems, where inference latency and energy consumption are critical metrics in addition to accuracy. In this paper, we propose Harmonic-NAS, a framework for the joint optimization of unimodal backbones and multimodal fusion networks with hardware awareness on resource-constrained devices. Harmonic-NAS involves a two-tier optimization approach for the unimodal backbone architectures and the fusion strategy and operators. By incorporating the hardware dimension into the optimization, evaluation results on various devices and multimodal datasets have demonstrated the superiority of Harmonic-NAS over state-of-the-art approaches, achieving up to 10.9% accuracy improvement, 1.91x latency reduction, and 2.14x energy efficiency gain. Comment: Accepted to the 15th Asian Conference on Machine Learning (ACML 2023)

arXiv.org e-Print Archive

Performance Evaluation and Design Tradeoffs of On-Chip Interconnect Architectures

Author: Bakhouya Mohmed
El-Ghazawi Tarek
Gaber Jaafar
Niar Smail
Suboh Suboh
Publication venue: HAL CCSD
Publication date: 08/10/2010

Network-on-Chip (NoC) has been proposed as an alternative to bus-based schemes to achieve high performance and scalability in System-on-Chip (SoC) design. Performance analysis and evaluation of on-chip interconnect architectures are widely based on simulations, which become computationally expensive, especially for large-scale NoCs. In this paper, a Network Calculus-based methodology is presented to analyze and evaluate performance and cost metrics, such as latency and energy consumption. The 2D Mesh, Spidergon and WK-recursive on-chip interconnect architectures are analyzed using this methodology, and the results are compared with those produced using simulations. The values obtained by simulations and by analysis show similar trends in the same order of magnitude. Furthermore, WK outperforms the other on-chip interconnects in all considered metrics.

HAL - Lille 3

INRIA a CCSD electronic archive server

FLASH-RL: Federated Learning Addressing System and Static Heterogeneity using Reinforcement Learning

Author: Benmeziane Hadjer
Bouaziz Sofiane
Hamdad Leila
Imine Youcef
Niar Smail
Ouarnoughi Hamza
Publication date: 12/11/2023

Federated Learning (FL) has emerged as a promising Machine Learning paradigm, enabling multiple users to collaboratively train a shared model while preserving their local data. To minimize computing and communication costs associated with parameter transfer, it is common practice in FL to select a subset of clients in each training round. This selection must consider both system and static heterogeneity. Therefore, we propose FLASH-RL, a framework that utilizes Double Deep Q-Learning (DDQL) to address both system and static heterogeneity in FL. FLASH-RL introduces a new reputation-based utility function to evaluate client contributions based on their current and past performances. Additionally, an adapted DDQL algorithm is proposed to expedite the learning process. Experimental results on the MNIST and CIFAR-10 datasets have shown FLASH-RL's effectiveness in achieving a balanced trade-off between model performance and end-to-end latency against existing solutions. Indeed, FLASH-RL reduces latency by up to 24.83% compared to FedAVG and 24.67% compared to FAVOR. It also reduces the training rounds by up to 60.44% compared to FedAVG and by more than 76% compared to FAVOR. In fall detection using the MobiAct dataset, FLASH-RL outperforms FedAVG by up to 2.82% in model performance and reduces latency by up to 34.75%. Additionally, FLASH-RL achieves the target performance faster, with up to a 45.32% reduction in training rounds compared to FedAVG. Comment: Accepted in the 41st IEEE International Conference on Computer Design (ICCD 2023)

arXiv.org e-Print Archive

An MDE Approach for Energy Consumption Estimation in MPSoC Design

Author: Ben Atitallah Rabie
Dekeyser Jean Luc
Jemai Abderrazak
Meftali Samy
Niar Smail
Trabelsi Chiraz
Publication venue: HAL CCSD
Publication date: 24/05/2010

Energy consumption is a leading criterion to take into account in the design of multiprocessor systems on chip (MPSoC). In this paper, we present a solution to estimate the energy consumption early in MPSoC design in order to find a good performance/energy trade-off in the design flow. This solution is based on the injection of consumption estimators between the hardware components during the co-simulation of a system at the CABA (Cycle Accurate Bit Accurate) level. These estimators are designed using a design framework and the corresponding SystemC code is automatically generated thanks to a model driven approach. Our solution offers an energy estimation framework without changing the IP (Intellectual Property) source codes, using standalone estimation modules, which allows their reuse. The accuracy of this approach is checked by integrating the consumption estimation in the simulation of significant applications.

HAL - Lille 3

INRIA a CCSD electronic archive server

An Efficient Power Estimation Methodology for Complex RISC Processor-based Platforms

Author: Ben Atitallah Rabie
Dekeyser Jean-Luc
Niar Smail
Rethinagiri Santhosh Kumar
Senn Eric
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/05/2012

In this contribution, we propose an efficient power estimation methodology for complex RISC processor-based platforms. In this methodology, Functional Level Power Analysis (FLPA) is used to set up generic power models for the different parts of the system. Then, a simulation framework based on a virtual platform is developed to accurately evaluate the activities used in the related power models. The combination of these two parts leads to a heterogeneous power estimation that gives a better trade-off between accuracy and speed. The usefulness and effectiveness of our proposed methodology are validated on ARM9 and ARM Cortex-A8 processors, designed around the OMAP5912 and OMAP3530 boards respectively. The efficiency and accuracy of our proposed methodology are evaluated using a variety of programs, from basic programs to complete media benchmarks. Estimated power values are compared to real board measurements for both the ARM940T and ARM Cortex-A8 architectures. Our power estimation results show less than 3% error for the ARM940T processor and 3.5% for the ARM Cortex-A8 processor-based system, and the estimation is 1x faster compared to state-of-the-art power estimation tools.

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL-Université de Bretagne Occidentale

System-Level Power Estimation Methodology for MPSoC based Platforms

Author: BEN ATITALLAH Rabie
DEKEYSER Jean-Luc
NIAR Smail
RETHINAGIRI Santhosh Kumar
Publication date: 01/01/2013

With the rise of new submicron silicon integration technologies, power consumption in multiprocessor systems on chip (MPSoC) has become a primary factor in the design flow. Taking this key factor into account from the earliest design phases plays a crucial role, since it makes it possible to increase component reliability and reduce the time to market of the final product. Shifting the design entry point up to the system level is the most important countermeasure adopted to manage the increasing complexity of Multiprocessor System on Chip (MPSoC). The reason is that decisions taken at this level, early in the design cycle, have the greatest impact on the final design in terms of power and energy efficiency. However, taking decisions at this level is very difficult, since the design space is extremely wide and it has so far been mostly a manual activity. Efficient system-level power estimation tools are therefore necessary to enable proper Design Space Exploration (DSE) based on power/energy and timing.

OpenGrey Repository

A survey of cross-layer power-reliability tradeoffs in multi and many core systems-on-chip

Author: Abdallah
Ahmed A. Eltawil
Amin Khajeh Djahromi
Bhavnagarwala
Bibiche Geuskens
Bolchini
Bowman
Calhoun
Chang
Dennard
Fadi J. Kurdahi
Joshi
Kuhn
Kurdahi
Mazen A.R. Saghir
Michael Engel
Mukhopadhyay
Nalam
Noble
Peter Marwedel
Smail Niar
Stolk
Uht
Wang
Yamauchi
Zhai
Publication venue: 'Elsevier BV'

Crossref